Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve perf of Utf8Parser.TryParse for int64 and uint64 #52423

Merged
merged 1 commit into from
Jul 12, 2021

Conversation

GrabYourPitchforks
Copy link
Member

@GrabYourPitchforks GrabYourPitchforks commented May 7, 2021

Somewhat inspired by investigating #52314 and seeing a little bit of low-hanging fruit that we can address.

Perf results (basically from running this benchmark with some additional inputs):

Method Toolchain value Mean Error StdDev Ratio
TryParseInt64 baseline -1 3.73 ns 0.022 ns 0.018 ns 1.00
TryParseInt64 compare -1 3.25 ns 0.025 ns 0.021 ns 0.87
TryParseInt64 baseline -12345 5.93 ns 1.680 ns 0.092 ns 1.00
TryParseInt64 compare -12345 5.85 ns 1.097 ns 0.060 ns 0.99
TryParseInt64 baseline -12345678901234567 15.75 ns 43.214 ns 2.368 ns 1.00
TryParseInt64 compare -12345678901234567 13.24 ns 0.371 ns 0.020 ns 0.85
TryParseInt64 baseline -9223372036854775808 19.35 ns 0.326 ns 0.305 ns 1.00
TryParseInt64 compare -9223372036854775808 17.12 ns 0.085 ns 0.076 ns 0.88
TryParseInt64 baseline 0 3.17 ns 0.097 ns 0.005 ns 1.00
TryParseInt64 compare 0 3.21 ns 0.602 ns 0.033 ns 1.01
TryParseInt64 baseline 1 3.27 ns 0.060 ns 0.003 ns 1.00
TryParseInt64 compare 1 3.21 ns 0.322 ns 0.017 ns 0.98
TryParseInt64 baseline 12345 5.76 ns 0.738 ns 0.040 ns 1.00
TryParseInt64 compare 12345 5.07 ns 0.929 ns 0.050 ns 0.88
TryParseInt64 baseline 12345678901234567 13.90 ns 0.247 ns 0.013 ns 1.00
TryParseInt64 compare 12345678901234567 12.85 ns 1.504 ns 0.082 ns 0.92
TryParseInt64 baseline 9223372036854775807 17.72 ns 0.102 ns 0.095 ns 1.00
TryParseInt64 compare 9223372036854775807 16.11 ns 0.063 ns 0.059 ns 0.91

Some of the benchmarks were going a bit screwy on my machine, so I basically ran them one at a time and concatenated all of the results into a single table. Not sure what was up with my environment.

The tl;dr is that this shouldn't have any worse perf than the existing code, and the total codegen ends up being 448 bytes smaller (on x64) compared to baseline across the two methods.

@ghost
Copy link

ghost commented May 7, 2021

Tagging subscribers to this area: @tannergooding, @pgovind, @GrabYourPitchforks
See info in area-owners.md if you want to be subscribed.

Issue Details

Somewhat inspired by investigating #52314 and seeing a little bit of low-hanging fruit that we can address.

Perf results (basically from running this benchmark with some additional inputs):

| Method | Toolchain | value | Mean Error | StdDev | Ratio |
|-------------- |---------- |--------------------- |---------:|----------:|---------:|------:|
| TryParseInt64 | baseline | -1 | 3.73 ns | 0.022 ns | 0.018 ns | 1.00 |
| TryParseInt64 | compare | -1 | 3.25 ns | 0.025 ns | 0.021 ns | 0.87 |
| | | | | | | |
| TryParseInt64 | baseline | -12345 | 5.93 ns | 1.680 ns | 0.092 ns | 1.00 |
| TryParseInt64 | compare | -12345 | 5.85 ns | 1.097 ns | 0.060 ns | 0.99 |
| | | | | | | |
| TryParseInt64 | baseline | -12345678901234567 | 15.75 ns | 43.214 ns | 2.368 ns | 1.00 |
| TryParseInt64 | compare | -12345678901234567 | 13.24 ns | 0.371 ns | 0.020 ns | 0.85 |
| | | | | | | |
| TryParseInt64 | baseline | -9223372036854775808 | 19.35 ns | 0.326 ns | 0.305 ns | 1.00 |
| TryParseInt64 | compare | -9223372036854775808 | 17.12 ns | 0.085 ns | 0.076 ns | 0.88 |
| | | | | | | |
| TryParseInt64 | baseline | 0 | 3.17 ns | 0.097 ns | 0.005 ns | 1.00 |
| TryParseInt64 | compare | 0 | 3.21 ns | 0.602 ns | 0.033 ns | 1.01 |
| | | | | | | |
| TryParseInt64 | baseline | 1 | 3.27 ns | 0.060 ns | 0.003 ns | 1.00 |
| TryParseInt64 | compare | 1 | 3.21 ns | 0.322 ns | 0.017 ns | 0.98 |
| | | | | | | |
| TryParseInt64 | baseline | 12345 | 5.76 ns | 0.738 ns | 0.040 ns | 1.00 |
| TryParseInt64 | compare | 12345 | 5.07 ns | 0.929 ns | 0.050 ns | 0.88 |
| | | | | | | |
| TryParseInt64 | baseline | 12345678901234567 | 13.90 ns | 0.247 ns | 0.013 ns | 1.00 |
| TryParseInt64 | compare | 12345678901234567 | 12.85 ns | 1.504 ns | 0.082 ns | 0.92 |
| | | | | | | |
| TryParseInt64 | baseline | 9223372036854775807 | 17.72 ns | 0.102 ns | 0.095 ns | 1.00 |
| TryParseInt64 | compare | 9223372036854775807 | 16.11 ns | 0.063 ns | 0.059 ns | 0.91 |

Some of the benchmarks were going a bit screwy on my machine, so I basically ran them one at a time and concatenated all of the results into a single table. Not sure what was up with my environment.

The tl;dr is that this shouldn't have any worse perf than the existing code, and the total codegen ends up being 448 bytes smaller compared to baseline across the two methods.

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Buffers, tenet-performance

Milestone: 6.0.0


if ((uint)firstChar == unchecked((uint)('-' - '0')))
{
sign--; // set to -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sign--; // set to -1
sign = -1;

Make it explicit?
The JIT will emit code like mov rax, 0xffffffffffffffff anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT emits dec rax for this scenario. I think it doesn't propagate the "if (idx != 0) { FAIL; }" assertion above for some reason.
I can set sign = -1; explicitly, but it does increase the codegen by 7 - 8 bytes. Maybe it's too much of a microoptimization to worry about and the cleaner code is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'd keep it as is. Thanks for the info.

// If sign = -1, this becomes value = (parsedValue ^ -1) - (-1) = ~parsedValue + 1 = -parsedValue

bytesConsumed = idx;
value = ((long)parsedValue ^ sign) - sign;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

The same could be done to other types too (like int32 above, etc.)?

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Are there any similar gains to be had in {u}long.TryParse?

@jeffhandley
Copy link
Member

@GrabYourPitchforks Is this ready to merge?

@jeffhandley
Copy link
Member

@GrabYourPitchforks Is this ready to merge?

Ping on this, @GrabYourPitchforks. It would be great to get this in for Preview 7.

@GrabYourPitchforks
Copy link
Member Author

I investigated the framework's other parsing routines, but they have a bit more complexity, needing to handle whitespace and trailing nulls. Opened #55541 so we don't lose track of this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants